Provide reproducible software environment deployment with GNU Guix #9
These two files are all it takes to set up a known-good test environment with:
guix time-machine -C channels.scm -- shell -m manifest.scm
The test environment can even be containerized:
guix time-machine -C channels.scm \
-- shell -m manifest.scm --container \
-- python3 perform_full_rodrigo.py my_directory
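For readers who have never seen these two files, here is a minimal sketch of what they might look like, written as a shell snippet that creates them. The commit hash and the package names are the ones visible in this thread; the channel URL is the usual upstream Guix one, and the "python" entry and the abridged package list are assumptions, so the actual files in this pull request may differ.

```shell
# channels.scm pins the Guix revision (commit taken from this thread;
# the URL is assumed to be the usual upstream Guix repository).
cat > channels.scm <<'EOF'
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "14c03807ba4bc81d42cf869f5b827f7da54ff843")))
EOF

# manifest.scm lists the packages of the environment (abridged sketch;
# "python" is an assumption, the other names appear in the diff).
cat > manifest.scm <<'EOF'
(specifications->manifest
 (list "python"
       "python-matplotlib"
       "python-networkx"
       "python-numpy"
       "python-pandas"
       "python-statsmodels"
       "python-tqdm"
       "gcc-toolchain@11"))
EOF

echo "wrote channels.scm and manifest.scm"
# Then, on a machine with Guix installed:
#   guix time-machine -C channels.scm -- shell -m manifest.scm
```

The first file fixes which revision of Guix is used; the second only names packages, so the exact versions follow from the pinned revision.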
GEERTS-reproduction/manifest.scm
"python-statsmodels"
"python-tqdm"

"gcc" ;should be gcc-toolchain but trick to have the default version
Could you make the comment a bit more explicit/clear? I have no idea what "trick to have the default version" means.
By default, Guix picks the latest version of a package available in the given Guix revision. Consider:
$ guix time-machine --commit=14c03807ba4bc81d42cf869f5b827f7da54ff843 \
-- show gcc-toolchain | recsel -C -p version
version: 12.3.0
version: 11.3.0
version: 10.4.0
version: 9.5.0
version: 8.5.0
version: 7.5.0
version: 6.5.0
version: 5.5.0
version: 4.9.4
version: 4.8.5
and so writing "gcc-toolchain" in the manifest.scm file would put version 12.3.0 in the computational environment. However, for this Guix revision, the default GCC toolchain used to compile everything is instead version 11.3.0. This should not be an issue, though. Still, I wanted to compile some code of the project with the same GCC toolchain as, for instance, the one used to compile python.
The package gcc is an alias to gcc-toolchain at the default GCC toolchain, here 11.3.0.
In other words, instead of "gcc", I could have written "gcc-toolchain@11.3.0", but because I was too lazy to look up this number @11.3.0, I used this trick with the alias gcc. :-)
In short, I wanted to avoid Guix bug#60200. :-)
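To make the version selection concrete, here is a small shell sketch that mimics the outcome of the lookup on the version list above. It is only an illustration built on `sort -V`, not how Guix resolves package specifications internally; the variable names are made up for the example.

```shell
# Versions of gcc-toolchain reported by the recsel query above.
versions="12.3.0 11.3.0 10.4.0 9.5.0 8.5.0 7.5.0 6.5.0 5.5.0 4.9.4 4.8.5"

# A bare "gcc-toolchain" ends up at the highest available version.
# ($versions is left unquoted on purpose so that it splits into words.)
bare=$(printf '%s\n' $versions | sort -V | tail -n 1)

# "gcc-toolchain@11" ends up at the highest version starting with "11.".
pinned=$(printf '%s\n' $versions | grep '^11\.' | sort -V | tail -n 1)

echo "gcc-toolchain    -> $bare"
echo "gcc-toolchain@11 -> $pinned"
```

With this list, the bare name lands on 12.3.0 while the `@11` spelling lands on 11.3.0, which is the default toolchain for this revision.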
Thanks for the explanation! Unfortunately I am still confused, even though now it's on a higher level :-)
My understanding until today was that "gcc" is a hidden package, used internally (in particular in the GNU build system), but not meant to be used as part of a profile or environment. Now I learn that it's a shorthand for the default version of "gcc-toolchain".
Is that documented somewhere? If so, the comment here should point to that documentation.
Is this special behavior for the Guile functions used in manifest.scm? At the command line, there definitely is no package "gcc" (just checked).
Well, it seems out of scope for this PR. ;-)
My initial comment in the manifest.scm file was just meant as a quick reminder for people taking a look. By now, I have spent more time on it than it would have taken to check the default version and simply write "gcc-toolchain@11". :-)
Let me explain where my initial comment comes from. :-)
The symbol gcc is defined in the module (gnu packages gcc):
;; Note: When changing the default gcc version, update
;; the gcc-toolchain-* definitions.
(define-public gcc gcc-11)
and indeed it is a hidden package; to see this, you need to follow the whole chain of inheritance back to version 4.7.4. Therefore, the package named gcc is not available at the command line.
$ guix show gcc
guix show: error: gcc: package not found
However, because many people expect the name gcc rather than gcc-toolchain, they were confused by the error when typing guix shell gcc or guix install gcc. Therefore, an alias was introduced:
$ guix shell gcc
guix shell: package 'gcc' has been superseded by 'gcc-toolchain'
Well, for the curious, it reads from the module (gnu packages commencement):
(define-public gcc-toolchain-aka-gcc
  ;; It's natural for users to try "guix install gcc". This package
  ;; automatically "redirects" them to 'gcc-toolchain'.
  (deprecated-package "gcc" gcc-toolchain-11))
Note that this alias is only defined for the default GCC toolchain. In other words, if all is fine and there is no bug, typing "gcc" gives gcc-toolchain at the default version (here 11.3.0). And it avoids the issue with the latest version that I explained before.
To be clear, when typing guix show gcc, the term gcc is looked up among the exposed package names, and none is found because the packages whose name field is "gcc" are hidden. On the other hand, when typing guix shell gcc, the term gcc is looked up among the exposed package names extended by the deprecated package names, hence the match. If we had (deprecated-package "kikoo" gcc-toolchain-11), then guix shell kikoo would return the computational environment with the GCC toolchain at version 11.3.0. Does it make sense?
All that said, the current consensus is to use the regular name for the default version (and for the older available ones) and to append -next to versions newer than the default. For example, emacs-next, python-cython-next, ghc-next, guile-next, etc. The only exception I know of is GCC.
To be honest, I do not really understand why this exception has not been fixed. Probably because it would break the workflow of some early-adopter HPC folks. :-)
My opinion is along the lines of Lars's proposal in Guix bug#60200: add a property marking the default among the packages when several variants are available. It would simplify the whole dance: get rid of -next and make the GCC case clearer. And, as often, not enough energy is put into explaining and reaching consensus. :-)
Hope that helps,
simon
Putting "gcc-toolchain@11" into manifest.scm definitely solves any issues I might have with this pull request. But so far this change hasn't made it to this repository!
Also, thanks for the explanation of the double meaning of gcc. I am not at all convinced this is a good idea: as your examples illustrate, gcc is now interpreted differently by different guix subcommands, which I think is a source of confusion. But that's definitely not related to this PR!
I pushed the manifest with gcc-toolchain@11.
Using the branch
- name: Create artifact
  run: |
    mkdir -p artifacts
    cd sources
    cp $(guix time-machine -C channels.scm -- pack -f docker --save-provenance -m manifest.scm) ../artifacts/docker-load-sources.tar.gz
    cd ../GEERTS-reproduction
    cp $(guix time-machine -C channels.scm -- pack -f docker --save-provenance -m manifest.scm) ../artifacts/docker-load-GEERTS.tar.gz
- name: Upload artifacts
  uses: actions/upload-artifact@v3
  with:
    name: docker-images
    path: artifacts/
It uses a GitHub Action to generate artifacts: reproducible Docker packs produced by Guix. This allows non-Guix users to download the Docker image and just run it. And because the Docker image is built via
"python-matplotlib"
"python-networkx"
"python-numpy"
"python-pandas"
Excuse my ignorance, but if I understand correctly, the version of python-pandas that will be installed is 1.4.4 (i.e., the version corresponding to guix@14c03807ba4bc81d42cf869f5b827f7da54ff843), whereas the version in requirements.txt is 1.4.1. Is this difference (and possibly other such differences) intended?
Edit: based on discussion on c-torre/replication-recanatesi-2015#1 it seems that I may have been mistaken: the intent here may not be to reproduce the replicated research, but rather to ensure that the original research is replicable in a reproducible way. Apologies for the noise.
I confirm the intent: ensure that the replication is reproducible. :-)
Moreover, please note that version labels such as 1.4.4 or 1.4.1 are not enough for real reproduction, because they identify only the source and not all the details needed to produce the binary artifacts. For example, where is the GCC version required to compile CPython encoded in requirements.txt or environment.yml?
Therefore, if you see a difference in the results (figures, plots, etc.) with the same 1.4.4 label, what is the source of the difference? A flaw in the analysis, or something in the dependencies of dependencies, such as build options?
Last, please note that identifying source code by a string label is not enough. We need a content-dependent identifier. A string label can be altered (e.g., a mutable Git tag), whereas a content-dependent identifier cannot (e.g., an immutable Git hash).
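As a small illustration of the last point, the sketch below creates two files that carry the same version label but different content, then compares their SHA-256 hashes. The file names and contents are made up for the example, and `sha256sum` from GNU coreutils is assumed to be available; it is a toy model of the idea, not how Guix or Git compute their identifiers.

```shell
workdir=$(mktemp -d)

# Two different "sources" that both claim to be version 1.4.4.
printf 'version: 1.4.4\nanswer = 1\n' > "$workdir/source-a.txt"
printf 'version: 1.4.4\nanswer = 2\n' > "$workdir/source-b.txt"

# The string label is the first line of each file.
label_a=$(head -n 1 "$workdir/source-a.txt")
label_b=$(head -n 1 "$workdir/source-b.txt")

# The content-dependent identifier is a hash of the whole file.
hash_a=$(sha256sum "$workdir/source-a.txt" | cut -d' ' -f1)
hash_b=$(sha256sum "$workdir/source-b.txt" | cut -d' ' -f1)

[ "$label_a" = "$label_b" ] && echo "labels are identical: $label_a"
[ "$hash_a" != "$hash_b" ] && echo "hashes differ"
```

The labels cannot tell the two files apart, whereas the hashes can, which is why a pinned commit hash is a stronger reference than a version number or a tag.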
> I confirm the intent: ensure that the replication is reproducible.
Thank you.
> Moreover, please note that version labels such as 1.4.4 or 1.4.1 are not enough for real reproduction
Agreed.
> Last, please note that identifying source code by a string label is not enough. We need a content-dependent identifier.
I see; yes, agreed.
Thank you!
Hi
As part of a Reproducible Research hackathon, we're looking into taking advantage of Guix to support reproducible deployment for submissions to ReScience C.
This pull request adds two files (four in fact: two for the main code and two for the directory GEERTS-reproduction) that let us deploy the software environment of this computational experiment in a reproducible fashion. These files are the Guix configuration, which in effect replaces the files environment.yml or requirements.txt. I tested it on my local machine; all the packages are provided by Guix. I also added a GitHub Action to build the software environment upon push. However, the computations take too long to be run there.
Well, I obtained some figures. Since I am not an expert, I cannot say whether they look similar or not, nor whether I ran the code with the right parameters.
Hope that helps.
Cc: @rougier @khinsen